Abstract

The signature of a path is an essential object in the theory of rough paths. 
The signature representation of the data stream can recover standard statistics, e.g. 
the moments of the data stream. The classification of random walks indicates the 
advantages of using the signature of a stream as the feature set for machine learning. 

1 Introduction 

This short paper is devoted to showing that the signature of the lead-lag transformation
is a useful way to encode a multi-dimensional unstructured data stream. We aim to
demonstrate the following points:

1. The signature of a discrete sample stream is a rich statistic and encodes the
essential information of the data stream;

2. The truncated signature of a discrete sample stream provides a summary in
terms of the effect of this stream, and it leads to dimension reduction for the
original stream;

3. The signature of a discrete sample can be used for parameter inference and 
prediction. 

The main result is Theorem 4.1, which states that, no matter how frequently the
path is sampled, the p-th moment of the increment process is a linear functional of
the truncated signature up to degree p.

2 Notation and Preliminaries 

2.1 Signatures 

Let us start by introducing the tensor algebra space, in which the signature of a
path takes its values.

Definition 2.1 (Tensor algebra space) A formal E-tensor series is a sequence
of tensors $(a_n \in E^{\otimes n})_{n \in \mathbb{N}}$, which we write $a = (a_0, a_1, \ldots)$. There are two binary
operations on E-tensor series, an addition $+$ and a product $\otimes$, which are defined as
follows. Let $a = (a_0, a_1, \ldots)$ and $b = (b_0, b_1, \ldots)$ be two E-tensor series. Then we
define

\[ a + b = (a_0 + b_0, a_1 + b_1, \ldots), \tag{1} \]

and

\[ a \otimes b = (c_0, c_1, \ldots), \tag{2} \]

where for each $n \ge 0$,

\[ c_n = \sum_{k=0}^{n} a_k \otimes b_{n-k}. \tag{3} \]

The product $a \otimes b$ is also denoted by $ab$. We use the notation $\mathbf{1}$ for the series
$(1, 0, \ldots)$, and $\mathbf{0}$ for the series $(0, 0, \ldots)$. If $\lambda \in \mathbb{R}$, then we define $\lambda a$ to be
$(\lambda a_0, \lambda a_1, \ldots)$.
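The product (3) can be sketched in code for series truncated at a finite level. The following is a minimal illustration, not part of the original text: it assumes a tensor series is stored as a dictionary from words (tuples of basis indices) to coefficients, and the names tensor_product, add and ONE are hypothetical.

```python
def tensor_product(a, b, depth):
    """Product (3), truncated at `depth`: c_n = sum_k a_k (x) b_{n-k}.

    A series is a dict mapping words (tuples of letters) to coefficients.
    The length of a word is its tensor level; () is the scalar level.
    """
    c = {}
    for u, au in a.items():
        for v, bv in b.items():
            if len(u) + len(v) <= depth:
                # basis tensors multiply by concatenating their words
                c[u + v] = c.get(u + v, 0.0) + au * bv
    return c


def add(a, b):
    """Levelwise addition (1)."""
    c = dict(a)
    for w, bw in b.items():
        c[w] = c.get(w, 0.0) + bw
    return c


ONE = {(): 1.0}                          # the series 1 = (1, 0, 0, ...)
a = {(): 2.0, (1,): 3.0}                 # 2 + 3 e_1
b = {(): 1.0, (2,): 5.0, (1, 2): 7.0}    # 1 + 5 e_2 + 7 e_1 (x) e_2

ab = tensor_product(a, b, depth=2)
ba = tensor_product(b, a, depth=2)
```

In this example the coefficient of the word (1, 2) differs between ab and ba, which illustrates that the product is not commutative, while ONE acts as the identity.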

Definition 2.2 The space $T((E))$ is defined to be the vector space of all formal
E-tensor series.

Similar to the real-valued case, we can define the exponential mapping on $T((E))$ as follows.

Definition 2.3 Let $a$ be an arbitrary element of $T((E))$. Then $\exp(a)$ is the element
of $T((E))$ defined by

\[ \exp(a) = \sum_{n=0}^{\infty} \frac{a^{\otimes n}}{n!}. \]

Now we are in a position to give the definition of the signature of a path of bounded 
variation (finite length). 

Definition 2.4 (Signature of a path) Let $J$ be a compact interval and $X$ be a
continuous function of finite length, which maps $J$ to $E$. The signature $S(X)$ of $X$
over the time interval $J$ is an element $(1, X^1, \ldots, X^n, \ldots)$ of $T((E))$ defined for each
$n \ge 1$ as follows:

\[ X^n = \int_{\substack{u_1 < \cdots < u_n \\ u_1, \ldots, u_n \in J}} dX_{u_1} \otimes \cdots \otimes dX_{u_n}, \]

where the integration is in the sense of Young's integral. The truncated signature of
$X$ of order $n$ is denoted by $S^n(X)$, i.e. $S^n(X) = (1, X^1, \ldots, X^n)$, for every $n \in \mathbb{N}$.


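Remark 2.1 Suppose that $\{e_i\}_{i=1}^{d}$ is a basis of $E$, and thus for every $n \ge 0$,
$\{e_{i_1} \otimes \cdots \otimes e_{i_n}\}_{i_1, \ldots, i_n \in \{1, \ldots, d\}}$ forms a basis of $E^{\otimes n}$. Therefore $S(X)$ can be rewritten
as follows:

\[ S(X) = 1 + \sum_{n=1}^{\infty} \sum_{i_1, \ldots, i_n \in \{1, \ldots, d\}} \left( \int_{\substack{u_1 < \cdots < u_n \\ u_1, \ldots, u_n \in J}} dX^{(i_1)}_{u_1} \cdots dX^{(i_n)}_{u_n} \right) e_{i_1} \otimes \cdots \otimes e_{i_n}. \]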

The signature of a path can be simply regarded as a formal infinite sum of
non-commutative tensor products, and the coefficient of each monomial is determined by
its corresponding coordinate iterated integral. For every multi-index $I = (i_1, \ldots, i_n)$,
denote by $X^I$ the following iterated integral of $X$ indexed by $I$, i.e.

\[ X^I = \int \cdots \int_{\substack{u_1 < \cdots < u_n \\ u_1, \ldots, u_n \in J}} dX^{(i_1)}_{u_1} \, dX^{(i_2)}_{u_2} \cdots dX^{(i_n)}_{u_n}. \]

The first property is Chen's identity (Theorem 2.1), which asserts that the signature
of the concatenation of two paths is the tensor product of the signature of each path.


Definition 2.5 Let $X : [0, s] \to E$ and $Y : [s, t] \to E$ be two continuous paths.
Their concatenation is the path $X * Y$ defined by

\[ (X * Y)_u = \begin{cases} X_u, & u \in [0, s], \\ X_s + Y_u - Y_s, & u \in [s, t], \end{cases} \]

where $0 \le s \le t$.


Theorem 2.1 (Chen's identity) Let $X : [0, s] \to E$ and $Y : [s, t] \to E$ be two
continuous paths with finite 1-variation. Then

\[ S(X * Y) = S(X) \otimes S(Y), \tag{4} \]

where $0 \le s \le t$.

The proof can be found in [5]. 

Let $\{e_i^*\}_{i=1}^{d}$ be a basis of the dual space $E^*$. Then for every $n \in \mathbb{N}$, $\{e^*_{i_1} \otimes \cdots \otimes e^*_{i_n}\}$
can be naturally extended to $(E^*)^{\otimes n}$ by identifying the basis $(e_I^* = e^*_{i_1} \otimes \cdots \otimes e^*_{i_n})$
as

\[ \langle e^*_{i_1} \otimes \cdots \otimes e^*_{i_n}, \; e_{j_1} \otimes \cdots \otimes e_{j_n} \rangle = \delta_{i_1, j_1} \cdots \delta_{i_n, j_n}. \]

The linear action of $(E^*)^{\otimes n}$ on $E^{\otimes n}$ extends naturally to a linear mapping $(E^*)^{\otimes n} \to T((E))^*$
defined by

\[ e_I^*(a) = e_I^*(a_n), \]

where $I = (i_1, \ldots, i_n)$.

Hence the linear forms $e_I^*$, as $I$ spans the set of finite words in the letters $1, \ldots, d$, form
a basis of $T(E^*)$. Let $T((E))^*$ denote the space of linear forms on $T((E))$ induced
by $T(E^*)$. Let us consider a word $I = (i_1, \ldots, i_n)$, where $i_1, \ldots, i_n \in \{1, \ldots, d\}$.
Define $\pi^I$ as $e_I^*$ with its domain restricted to the range of the signatures, denoted by
$S(\mathcal{V}^1([0, T], E))$, in formula

\[ \pi^I(S(X)) = e_I^*(S(X)), \]

where $X$ is any E-valued continuous path of bounded variation.

For any two words $I$ and $J$, the pointwise product of the two linear forms $\pi^I$ and $\pi^J$
as real-valued functions is a quadratic form on $S(\mathcal{V}^1([0, T], E))$, but it is remarkable
that it is still a linear form, which is stated in Theorem 2.2. Let us introduce the
definition of the shuffle product.

Definition 2.6 We define the set $S_{m,n}$ of $(m, n)$ shuffles to be the subset of
permutations in the symmetric group $S_{m+n}$ defined by

\[ S_{m,n} = \{ \sigma \in S_{m+n} : \sigma(1) < \cdots < \sigma(m), \; \sigma(m+1) < \cdots < \sigma(m+n) \}. \]

Definition 2.7 The shuffle product of $\pi^I$ and $\pi^J$, denoted by $\pi^I \sqcup\!\sqcup \pi^J$, is defined as
follows:

\[ \pi^I \sqcup\!\sqcup \pi^J = \sum_{\sigma \in S_{n,m}} \pi^{(k_{\sigma^{-1}(1)}, \ldots, k_{\sigma^{-1}(m+n)})}, \]

where $I = (i_1, \ldots, i_n)$, $J = (j_1, \ldots, j_m)$ and $(k_1, \ldots, k_{m+n}) = (i_1, \ldots, i_n, j_1, \ldots, j_m)$.

Theorem 2.2 (Shuffle Product Property) Let $X$ be a path of bounded variation.
Let $I$ and $J$ be two arbitrary indices. The following identity holds:

\[ \pi^I(S(X)) \, \pi^J(S(X)) = (\pi^I \sqcup\!\sqcup \pi^J)(S(X)). \]
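The shuffle product property can be verified numerically for a piecewise-linear path. The following is a minimal sketch under stated assumptions: tensor series are stored as dictionaries from words to coefficients, the signature of a straight segment is taken to be the exponential of its increment, and all function names (shuffles, tensor_product, segment_exp, signature) are hypothetical helpers rather than an existing library API.

```python
def shuffles(u, v):
    """All interleavings of the words u and v, counted with multiplicity."""
    if not u:
        return [v]
    if not v:
        return [u]
    return ([u[:1] + w for w in shuffles(u[1:], v)]
            + [v[:1] + w for w in shuffles(u, v[1:])])


def tensor_product(a, b, depth):
    """Product of two tensor series (dicts word -> coefficient), truncated."""
    c = {}
    for u, au in a.items():
        for v, bv in b.items():
            if len(u) + len(v) <= depth:
                c[u + v] = c.get(u + v, 0.0) + au * bv
    return c


def segment_exp(inc, depth):
    """Truncated exp of sum_i inc[i-1] e_i: signature of a straight segment."""
    out, level = {(): 1.0}, {(): 1.0}
    for n in range(1, depth + 1):
        # level n equals level n-1 times the increment, divided by n
        level = {w + (i,): c * x / n
                 for w, c in level.items()
                 for i, x in enumerate(inc, start=1)}
        out.update(level)
    return out


def signature(points, depth):
    """Truncated signature of the piecewise-linear path through `points`."""
    sig = {(): 1.0}
    for p, q in zip(points, points[1:]):
        inc = [qi - pi for pi, qi in zip(p, q)]
        sig = tensor_product(sig, segment_exp(inc, depth), depth)
    return sig


path = [(0.0, 0.0), (1.0, 0.5), (0.5, 2.0), (2.0, 1.0)]
I, J = (1, 2), (2,)
sig = signature(path, depth=len(I) + len(J))

lhs = sig[I] * sig[J]                          # pointwise product
rhs = sum(sig[w] for w in shuffles(I, J))      # shuffle product, a linear form
```

For the words chosen here the shuffle consists of (1, 2, 2) twice and (2, 1, 2) once, so the quadratic quantity lhs is recovered from three level-3 coordinates of the signature.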


3 A discrete sampled path and the signature of its 
lead-lag transformation 

In the following we restrict our discussion to paths observed at a finite number of
time stamps and taking values in $E := \mathbb{R}^d$.


3.1 The discrete sampled path and the lead-lag transformation

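Let $\{x_n\}_{n=1}^{L}$ be an increment process, where $x_n \in E$. (You can think of it as a
return process.) Let $\mathbb{X} := \{\mathbb{X}_n\}_{n=0}^{L}$ denote the corresponding partial sum process
of $\{x_n\}_{n=1}^{L}$. (It can be thought of as a price process.) Mathematically, $\mathbb{X}$ is defined as
follows:

\[ \mathbb{X}_0 = 0; \qquad \mathbb{X}_n = \sum_{i=1}^{n} x_i \quad \text{if } n = 1, \ldots, L. \]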

Now let us introduce the lead-lag transformation associated with a d-dimensional
stream $\mathbb{X} = \{\mathbb{X}_n\}_{n=0}^{L}$.

Definition 3.1 (Lead-Lag Transformation) Let $\mathbb{X} := \{\mathbb{X}_n\}_{n=0}^{L}$ be a d-dimensional
discrete sampled path. The lead-lag transformation associated with $\mathbb{X}$ is a
2d-dimensional path which is obtained by linear interpolation of $X := \{X_n\}_{n=0}^{2L}$, where
$X_0^{(i)} = X_0^{(i+d)} = \mathbb{X}_0^{(i)}$ and, for every $n \in \{0, \ldots, L-1\}$ and for every
$i \in \{1, \ldots, d\}$,

\[ X^{(i)}_{2n+1} = X^{(i)}_{2n+2} = \mathbb{X}^{(i)}_{n+1}, \qquad X^{(i+d)}_{2n+1} = \mathbb{X}^{(i)}_{n}, \qquad X^{(i+d)}_{2n+2} = \mathbb{X}^{(i)}_{n+1}. \]

Let $\mathcal{L}$ denote the lead-lag transformation operator.
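The lead-lag process $X$ is of the following form, where each $\mathbb{X}_n$ stands for the column
$(\mathbb{X}_n^{(1)}, \ldots, \mathbb{X}_n^{(d)})^{\top}$ of the discrete sampled path, the upper block holds coordinates
$1, \ldots, d$ and the lower block holds coordinates $d+1, \ldots, 2d$:

\[
X_0 = \begin{pmatrix} \mathbb{X}_0 \\ \mathbb{X}_0 \end{pmatrix}, \quad
X_1 = \begin{pmatrix} \mathbb{X}_1 \\ \mathbb{X}_0 \end{pmatrix}, \quad
X_2 = \begin{pmatrix} \mathbb{X}_1 \\ \mathbb{X}_1 \end{pmatrix}, \quad \ldots, \quad
X_{2n-1} = \begin{pmatrix} \mathbb{X}_n \\ \mathbb{X}_{n-1} \end{pmatrix}, \quad
X_{2n} = \begin{pmatrix} \mathbb{X}_n \\ \mathbb{X}_n \end{pmatrix}.
\]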
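The lead-lag construction can be sketched in a few lines. This is a minimal illustration rather than the author's code: the function name lead_lag is hypothetical, and the coordinate order (the lead in the first d coordinates, the lag in the last d, with the lead moving first at each step) is the convention implied by Lemma 4.1.

```python
def lead_lag(path):
    """Lead-lag points of a discrete path given as a list of d-tuples.

    Returns 2L + 1 points in dimension 2d; the path itself is the linear
    interpolation of these points.
    """
    first = tuple(path[0])
    out = [first + first]
    for prev, cur in zip(path, path[1:]):
        out.append(tuple(cur) + tuple(prev))  # lead coordinates move first
        out.append(tuple(cur) + tuple(cur))   # lag coordinates catch up
    return out


prices = [(0.0,), (1.0,), (4.0,), (2.0,)]
ll = lead_lag(prices)
```

For a one-dimensional stream with L = 3 this produces 7 points, alternating between a horizontal move (the lead) and a vertical move (the lag) of the same size.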


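Lemma 3.1 (The multiplicative property of the lead-lag transformation) For any two
discrete sampled paths $\mathbb{X} = \{\mathbb{X}_n\}_{n=0}^{L_1}$ and $\mathbb{Y} = \{\mathbb{Y}_n\}_{n=0}^{L_2}$,

\[ \mathcal{L}(\mathbb{X} * \mathbb{Y}) = \mathcal{L}(\mathbb{X}) * \mathcal{L}(\mathbb{Y}), \]

where $\mathbb{X} * \mathbb{Y}$ denotes the concatenation of the two discrete sampled paths, i.e.

\[ (\mathbb{X} * \mathbb{Y})_n = \begin{cases} \mathbb{X}_n & \text{if } n \le L_1 - 1, \\ \mathbb{X}_{L_1} - \mathbb{Y}_0 + \mathbb{Y}_{n - L_1} & \text{if } L_1 \le n \le L_1 + L_2. \end{cases} \]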


3.2 The signature of the lead-lag transformation 

Let us define the signature of the discrete sampled stream, and discuss the relevant 
properties. 

Definition 3.2 (The signature representation of a discrete sampled stream)
Let $\mathbb{X}$ be a discrete sampled path in $E$ and let $X$ be the lead-lag transformation of $\mathbb{X}$.
The signature of $\mathbb{X}$ is defined to be the signature of $X$, denoted by $S(\mathbb{X})$. Let $S_d(\mathbb{X})$
denote the truncated signature of $\mathbb{X}$ up to degree $d$. Let $VS$ denote the range of
signatures of the lead-lag transformations of discrete sampled paths in $E$.

Lemma 3.2 (Chen's Identity for Discrete Sampled Path) For any two discrete
sampled paths $\mathbb{X} = \{\mathbb{X}_n\}_{n=0}^{L_1}$ and $\mathbb{Y} = \{\mathbb{Y}_n\}_{n=0}^{L_2}$,

\[ S(\mathcal{L}(\mathbb{X} * \mathbb{Y})) = S(\mathcal{L}(\mathbb{X})) \otimes S(\mathcal{L}(\mathbb{Y})). \]

Definition 3.3 (Additive functional on VS) Let $K$ be a linear form on $T((E))$.
We say that $K$ is additive in $VS$ if and only if for every $S(\mathbb{X}), S(\mathbb{Y}) \in VS$, it follows
that

\[ K(S(\mathbb{X} * \mathbb{Y})) = K(S(\mathbb{X})) + K(S(\mathbb{Y})). \]

For convenience, let us adopt the following notation.

Definition 3.4 Fix any positive integer $p$. Let $\mathcal{K}_I^{(p)}$ denote the set of the linear
forms on $T((E))$ that can be written as

\[ \sum_{|J| = p, \; J = (J_1, I)} c_J \, \pi^J, \]

where the $c_J$ are constants and the summation is taken over all $J$ such that $J$ is of
length $p$ and ends in the substring $I$.


4 One Dimensional Stream Case 


Let us focus on the one-dimensional case. We will show that the signature of X
contains rich information about the path X and that it is a good basis function to represent
standard statistics, for example, the empirical moments of the increments of X (Theorem
4.1). Let us start with a discussion of the properties of the signature of X.

By Chen's identity and a simple calculation, the signature of a path in VS can be
given explicitly as follows:


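Lemma 4.1 (Signature of one-dimensional discrete path) For any $S(\mathbb{X}) \in VS$,
where $\{x_i\}_{i=1}^{L}$ is the increment process associated with $\mathbb{X}$,

\[ S(\mathbb{X}) = \bigotimes_{i=1}^{L} \bigl( \exp(x_i e_1) \otimes \exp(x_i e_2) \bigr). \]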


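Lemma 4.2 For every index $I$ ending in 2 and any positive integer $p$, there exists
$K \in \mathcal{K}_2^{(|I|+p)}$ (a linear combination of words of length $|I| + p$ ending in 2) such that
for any $S(\mathbb{X}_L) \in VS$,

\[ \pi^{(I, M_p)}(S(\mathbb{X}_L)) = K(S(\mathbb{X}_L)), \]

where $M_p = (1, \ldots, 1)$, i.e. $p$ copies of 1.

For every index $I$ ending in 1 and any positive integer $p$, there exists
$K \in \mathcal{K}_1^{(|I|+p)}$ (a linear combination of words of length $|I| + p$ ending in 1) such that
for any $S(\mathbb{X}_L) \in VS$,

\[ \pi^{(I, K_p)}(S(\mathbb{X}_L)) = K(S(\mathbb{X}_L)), \]

where $K_p = (2, \ldots, 2)$, i.e. $p$ copies of 2.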

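Proof. First of all, let us prove the case $p = 1$. As $I$ ends in 2, we can
rewrite $I$ as $(J, 2)$. Since $(\pi^{(1)} - \pi^{(2)})(S(\mathbb{X})) = 0$ for every $S(\mathbb{X}) \in VS$, the shuffle
product property gives, on $VS$,

\[ 0 = \pi^I \sqcup\!\sqcup \pi^{(1)} - \pi^I \sqcup\!\sqcup \pi^{(2)} = \pi^{(I, 1)} + \pi^{(J \sqcup\!\sqcup 1, \, 2)} - \pi^I \sqcup\!\sqcup \pi^{(2)}, \]

so that $\pi^{(I,1)} = \pi^I \sqcup\!\sqcup \pi^{(2)} - \pi^{(J \sqcup\!\sqcup 1, \, 2)}$, in which every word ends in 2. Here
$\pi^{(J \sqcup\!\sqcup 1, \, 2)}$ denotes the sum of $\pi^{(w, 2)}$ over all words $w$ in the shuffle of $J$ and $(1)$.

Then we prove this statement by induction on $p$. Let $K_p$ be $p$ copies of 2.

Let us investigate the term $\pi^{(I \sqcup\!\sqcup M_{p-1}, \, 1)}$. We have

\[ (I \sqcup\!\sqcup M_{p-1}, \, 1) = (I, M_p) + \sum_{k=1}^{p-1} (J \sqcup\!\sqcup M_k, \, 2, \, M_{p-k}), \]

and thus

\[ \pi^{(I, M_p)} = \pi^{(I \sqcup\!\sqcup M_{p-1}, \, 1)} - \sum_{k=1}^{p-1} \pi^{(J \sqcup\!\sqcup M_k, \, 2, \, M_{p-k})}. \]

For any $k = 1, \ldots, p-1$, by the induction hypothesis, there exists a linear functional
$G_k$, a linear combination of words ending in 2, such that for any $S(\mathbb{X}) \in VS$,

\[ \pi^{(J \sqcup\!\sqcup M_k, \, 2, \, M_{p-k})}(S(\mathbb{X})) = G_k(S(\mathbb{X})). \]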


Therefore 

p-i 

k=l 

= TT^ LU TT^*’ - _Q^ 

This completes the first part of the statement. The same strategy can be used to
show the second part of the statement. ■


Remark 4.1 Since = n^p, Lemma 4-2 shows that for each index I, can 
be rewritten as a linear functional in 

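Lemma 4.3 For any index $I = (i_1, \ldots, i_{n-1}, 2)$, and any $S(\mathbb{X}_L) \in VS$,

\[ \pi^{(I, 1)}(S(\mathbb{X}_L)) = \sum_{i=1}^{L} \pi^{I}(S(\mathbb{X}_{i-1})) \, x_i, \tag{5} \]

where $\mathbb{X}_i$ denotes the stream restricted to its first $i$ increments, so that $S(\mathbb{X}_0) = \mathbf{1}$.

Proof. We show this lemma by induction on $L$. For $L = 1$, both sides of (5) are
equal to 0. By Chen's identity, for $L > 1$, it follows that

\[ \pi^{(I,1)}(S(\mathbb{X}_L)) = \pi^{(I,1)}\bigl(S(\mathbb{X}_{L-1}) \otimes S(\mathbb{X}_{L-1,L})\bigr) = \pi^{(I,1)}(S(\mathbb{X}_{L-1})) + \pi^{I}(S(\mathbb{X}_{L-1})) \, x_L, \]

because

\[ S(\mathbb{X}_{L-1,L}) = \exp(x_L e_1) \otimes \exp(x_L e_2), \]

whose only non-zero coordinates are indexed by words consisting of a block of 1s followed by a
block of 2s, and the only suffixes of $(I, 1)$ of this form are the empty word and $(1)$.
Then it follows by the induction hypothesis that

\[ \pi^{(I,1)}(S(\mathbb{X}_L)) = \sum_{i=1}^{L-1} \pi^{I}(S(\mathbb{X}_{i-1})) \, x_i + \pi^{I}(S(\mathbb{X}_{L-1})) \, x_L = \sum_{i=1}^{L} \pi^{I}(S(\mathbb{X}_{i-1})) \, x_i. \]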


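Lemma 4.4 For any index $I = (i_1, \ldots, i_{n-1}, 2)$ and $k \ge 1$ there exists a linear
functional $F$ depending only on $I$ and $k$, with $F \in \mathcal{K}_2^{(n+k)}$ (a linear combination of
words of length $n + k$ ending in 2), such that for any $S(\mathbb{X}_L) \in VS$, it holds that

\[ F(S(\mathbb{X}_L)) = \sum_{j=1}^{L} \pi^{I}(S(\mathbb{X}_{j-1})) \, x_j^{k}. \tag{6} \]

Proof. For $k = 1$, it is proved in Lemma 4.3. Assume that the statement is true for
$k \le K - 1$. Let us consider the case where $k = K$. Writing $1^j$ for $j$ copies of 1, Chen's
identity gives

\[ \pi^{(I, 1^k)}(S(\mathbb{X}_L)) = \sum_{j=0}^{k} \pi^{(I, 1^j)}(S(\mathbb{X}_{L-1})) \, \frac{x_L^{k-j}}{(k-j)!}. \]

After rearranging the above formula we have that

\[ \pi^{(I, 1^k)}(S(\mathbb{X}_L)) - \pi^{(I, 1^k)}(S(\mathbb{X}_{L-1})) = \sum_{j=0}^{k-1} \pi^{(I, 1^j)}(S(\mathbb{X}_{L-1})) \, \frac{x_L^{k-j}}{(k-j)!}. \]

By a telescoping sum of the above equation, we have that

\[ \pi^{(I, 1^k)}(S(\mathbb{X}_L)) = \sum_{i=1}^{L} \sum_{j=0}^{k-1} \pi^{(I, 1^j)}(S(\mathbb{X}_{i-1})) \, \frac{x_i^{k-j}}{(k-j)!} = \frac{1}{k!} \sum_{i=1}^{L} \pi^{I}(S(\mathbb{X}_{i-1})) \, x_i^{k} + \sum_{j=1}^{k-1} \frac{1}{(k-j)!} \sum_{i=1}^{L} \pi^{(I, 1^j)}(S(\mathbb{X}_{i-1})) \, x_i^{k-j}. \]

By Lemma 4.2 there is a linear functional $G$ depending on $(I, 1^j)$, a linear combination of
words ending in 2, such that on $VS$

\[ \pi^{(I, 1^j)} = G. \]

Then by the induction hypothesis, for each $j = 1, \ldots, k-1$,

\[ \sum_{i=1}^{L} \pi^{(I, 1^j)}(S(\mathbb{X}_{i-1})) \, x_i^{k-j} \]

can be rewritten as a linear functional of $S(\mathbb{X}_L)$ in words ending in 2. Since $\pi^{(I, 1^k)}$ can also
be rewritten as such a linear functional by Lemma 4.2, so can $\sum_{i=1}^{L} \pi^{I}(S(\mathbb{X}_{i-1})) \, x_i^{k}$.
Now the proof is complete.

■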

■ 

Lemma 4.5 Let Li G and Li is additive, then there exists Li G such 

that 

n 2 

Li(5(X„)) = -^Li(5(X),)^; 

2=1 

Proof. Let Li := For n > 1, it holds that 

7r(^i'2.i)(^(X„)) 

= 7r(^i’2.i)(^(x„_i)®5(X„_i,„)) 
|t~"( ??■







Similarly we have 

+7r(^i)(5(X„_i))^ + 
+ 7^/1 (S'(X„_i), Xn). 


where 


7^7,(5(X„_l),a;„) = 7r(^)5(X„_i)^^n^(X„-i,„)) 

Y, 7r(^)5(X„_i)cj,a;L^il. 

J»Jl=/l,J5^0,Jl/0 


The last equality comes from the fact that 


7r(^^’^’2)(5'(X„_i,„)) = 7r(^i’^’^^(exp(a::„ei) (g) exp(a:„e2)) 

= 7r(^i^(exp(a;„ei))7r(^’^)(exp(x„e2)) 


By Lemma 4.4 there exists a linear functional Gj^ on such that 


Gq(5(X„)) - Gq(5(X„_i)) = 7^q(5(X„_i), a;„). 


Thus it follows 

^(/l.2.1)(^(X„))-7r(^i>2-2)(5(X„)) 
+ G(5(X„))-G(^(X„_i)). 


(5(X„_i,„))) 


X 

~2 


where 

G(^(X„)) = ^G7,G/,(5(X„)). 

h 

Then following the notations 

Ll = ^ Ci, _ ^,■(11,2,2)'^ _ Q 

h 

fn = L(S(X„)) 
S to



and it is obvious that $f_0 = 0$. Moreover, since $L_1$ is additive, $L_1(S(\mathbb{X}_{n-1})) +
L_1(S(\mathbb{X}_{n-1,n})) = L_1(S(\mathbb{X}_n))$, and it follows

h 

= /„_i-(Li^(X„_i) + Li,5(X„_i,„))^ 

= /„_i-Li,5(X„)^. 

By the telescoping sum of $f_n$, it holds that

n n 2 

fn = - /-i) + /o = ■ 


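Theorem 4.1 (p-moment) For any integer $p > 0$, there exist two linear functionals
$L_1^{(p)} \in \mathcal{K}_1^{(p)}$ and $L_2^{(p)} \in \mathcal{K}_2^{(p)}$, i.e. linear combinations of words of length $p$ ending
in 1 and in 2 respectively, such that for every path $\mathbb{X}$ with increments $x_1, \ldots, x_N$, the
following equation holds:

\[ L_1^{(p)}(S(\mathbb{X})) = L_2^{(p)}(S(\mathbb{X})) = \sum_{i=1}^{N} x_i^{p}. \tag{7} \]

Obviously if (7) is true, then $L_1^{(p)}$ and $L_2^{(p)}$ are both additive.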
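As a worked instance of the p-moment statement, the case p = 2 can be written out by hand. The derivation below is a sketch added for illustration, not quoted from the source; it assumes the factorisation of Lemma 4.1, in which each increment $x_i$ is applied first in direction $e_1$ and then in direction $e_2$.

```latex
\begin{align*}
\pi^{(1,2)}(S(\mathbb{X})) &= \sum_{i<j} x_i x_j + \sum_{i=1}^{N} x_i^2, \\
\pi^{(2,1)}(S(\mathbb{X})) &= \sum_{i<j} x_i x_j, \\
\pi^{(1,1)}(S(\mathbb{X})) = \pi^{(2,2)}(S(\mathbb{X})) &= \sum_{i<j} x_i x_j + \tfrac{1}{2} \sum_{i=1}^{N} x_i^2, \\
\sum_{i=1}^{N} x_i^2 &= 2\bigl(\pi^{(1,2)} - \pi^{(2,2)}\bigr)(S(\mathbb{X}))
                      = 2\bigl(\pi^{(1,1)} - \pi^{(2,1)}\bigr)(S(\mathbb{X})).
\end{align*}
```

The first expression in the last line uses only words of length 2 ending in 2, and the second uses only words of length 2 ending in 1, which matches the two functionals in the theorem for p = 2.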


Proof. Let’s prove it by induction on p. It is true for p = 1,2. Suppose that it 
holds for p < P. Let us study the case when p = P. 

N 

N 




2 = 1 
N 


N 


= E 4-2 - E 4-2(5(x)._ 


ix. 


2 = 1 


2 = 1 


4.5 


since 44 i® additive, then 44(*^(X)i-ia:i can be rewritten 


By Lemma 
as a linear functional G G such that 


N 


Giy(X)^)=E4-2(^(X)*x2. 


( 2 ) 

By Lemma 4.4 it follows that there exists G 2 € /Cp , such that 


N 


G2y(X)^) = E4-2(^(X),-ia;?. 


2=1 
5 Multi-Dimensional Stream Case


The following lemma states that the empirical covariance of the increment of a 
multi-dimensional data stream can be fully characterized by its signatures. 

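Lemma 5.1 Let $\mathbb{X} := \{\mathbb{X}_n\}_{n=0}^{L}$ be a d-dimensional discretely sampled stream,
$\{x_n\}_{n=1}^{L}$ be the associated increment process and $X$ be the corresponding lead-lag
process of $\mathbb{X}$. For any $i_1, i_2 \in \{1, \ldots, d\}$, there exists a linear functional $L$ such that

\[ L(S(X)) = \sum_{n=1}^{L} x_n^{(i_1)} x_n^{(i_2)}. \]

Proof. For the case that $i_1 = i_2 = i$, it holds that

\[ \sum_{n=1}^{L} \bigl(x_n^{(i)}\bigr)^2 = \pi^{(i, i+d)}(S(X)) - \pi^{(i+d, i)}(S(X)) = 2\bigl(\pi^{(i, i+d)}(S(X)) - \pi^{(i, i)}(S(X))\bigr), \]

as

\[ \pi^{(i, i+d)} + \pi^{(i+d, i)} = \pi^{(i)} \pi^{(i+d)} = \pi^{(i)} \pi^{(i)} = \pi^{(i)} \sqcup\!\sqcup \pi^{(i)} = 2 \pi^{(i, i)}. \]

For the case that $i_1 \ne i_2$, the signature of the path $X^{(i_1, i_2)}$, which is the $(i_1, i_2)$
coordinate projection of $X$, is given as

\[ S(X^{(i_1, i_2)}) = \bigotimes_{n=1}^{L} \exp\bigl(x_n^{(i_1)} e_{i_1} + x_n^{(i_2)} e_{i_2}\bigr), \]

then it follows that

\[ \pi^{(i_1, i_2)}(S(X)) = \sum_{n_1 < n_2} x_{n_1}^{(i_1)} x_{n_2}^{(i_2)} + \frac{1}{2} \sum_{n=1}^{L} x_n^{(i_1)} x_n^{(i_2)}. \tag{8} \]

The signature of the path $X^{(i_1, i_2 + d)}$, which is the $(i_1, i_2 + d)$ coordinate projection of
$X$, is given as

\[ S(X^{(i_1, i_2 + d)}) = \bigotimes_{n=1}^{L} \Bigl( \exp\bigl(x_n^{(i_1)} e_{i_1}\bigr) \otimes \exp\bigl(x_n^{(i_2)} e_{i_2 + d}\bigr) \Bigr), \]

then it follows that

\[ \pi^{(i_1, i_2 + d)}(S(X)) = \sum_{n_1 < n_2} x_{n_1}^{(i_1)} x_{n_2}^{(i_2)} + \sum_{n=1}^{L} x_n^{(i_1)} x_n^{(i_2)}. \tag{9} \]

Combining (8) and (9), it follows that

\[ \sum_{n=1}^{L} x_n^{(i_1)} x_n^{(i_2)} = 2\bigl(\pi^{(i_1, i_2 + d)}(S(X)) - \pi^{(i_1, i_2)}(S(X))\bigr). \]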
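The second-moment identity of Lemma 5.1 can be checked numerically. The sketch below is illustrative: lead_lag and level2 are hypothetical helper names, the stream is arbitrary test data, and the lead-lag coordinate order (lead in the first d coordinates, lag in the last d) is assumed from Lemma 4.1. Indices in the code are 0-based, so coordinate $i_2 + d$ of the text becomes i2 + d with i2 starting at 0.

```python
def lead_lag(path):
    """Lead-lag points: lead in the first d coordinates, lag in the last d."""
    first = tuple(path[0])
    out = [first + first]
    for prev, cur in zip(path, path[1:]):
        out.append(tuple(cur) + tuple(prev))
        out.append(tuple(cur) + tuple(cur))
    return out


def level2(points):
    """Second level of the signature of the piecewise-linear path."""
    m = len(points[0])
    s1 = [0.0] * m
    s2 = [[0.0] * m for _ in range(m)]
    for p, q in zip(points, points[1:]):
        inc = [qi - pi for pi, qi in zip(p, q)]
        for i in range(m):
            for j in range(m):
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(m):
            s1[i] += inc[i]
    return s2


stream = [(0.0, 0.0), (1.0, -1.0), (1.5, 2.0), (-0.5, 2.5), (0.0, 0.0)]
d = 2
incs = [[q[i] - p[i] for i in range(d)] for p, q in zip(stream, stream[1:])]
s2 = level2(lead_lag(stream))

results = {}
for i1 in range(d):
    for i2 in range(d):
        moment = sum(x[i1] * x[i2] for x in incs)          # sum_n x^(i1) x^(i2)
        from_sig = 2.0 * (s2[i1][i2 + d] - s2[i1][i2])     # 2 (pi^(i1,i2+d) - pi^(i1,i2))
        results[(i1, i2)] = (moment, from_sig)
```

Each entry of results holds the empirical second moment and its signature expression, which agree up to floating-point rounding for both the diagonal and off-diagonal pairs.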
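Lemma 5.2 Let $\mathbb{X} := \{\mathbb{X}_n\}_{n=0}^{L}$ be a d-dimensional discretely sampled stream,
$\{x_n\}_{n=1}^{L}$ be the associated increment process and $X$ be the corresponding lead-lag
process of $\mathbb{X}$. For any pairwise different $i_1, i_2, i_3 \in \{1, \ldots, d\}$, there exists a linear
functional $L$ such that

\[ L(S(X)) = \sum_{n=1}^{L} x_n^{(i_1)} x_n^{(i_2)} x_n^{(i_3)}, \]

namely

\[ L = 6\bigl(\pi^{(i_1, i_2, i_3)} + \pi^{(i_1, i_2, i_3 + d)} - \pi^{(i_1 + d, i_2, i_3 + d)} - \pi^{(i_1, i_2 + d, i_3 + d)}\bigr). \]

Proof. The signature of the path $X^{(i_1, i_2, i_3)}$, which is the $(i_1, i_2, i_3)$ coordinate
projection of $X$, is given as

\[ S(X^{(i_1, i_2, i_3)}) = \bigotimes_{n=1}^{L} \exp\bigl(x_n^{(i_1)} e_{i_1} + x_n^{(i_2)} e_{i_2} + x_n^{(i_3)} e_{i_3}\bigr), \]

then it follows that

\[ \pi^{(i_1, i_2, i_3)}(S(X)) = \sum_{n_1 < n_2 < n_3} x_{n_1}^{(i_1)} x_{n_2}^{(i_2)} x_{n_3}^{(i_3)} + \frac{1}{2} \sum_{n_1 < n_2} x_{n_1}^{(i_1)} x_{n_1}^{(i_2)} x_{n_2}^{(i_3)} + \frac{1}{2} \sum_{n_1 < n_2} x_{n_1}^{(i_1)} x_{n_2}^{(i_2)} x_{n_2}^{(i_3)} + \frac{1}{6} \sum_{n_1 = 1}^{L} x_{n_1}^{(i_1)} x_{n_1}^{(i_2)} x_{n_1}^{(i_3)}. \]

The signature of the path $X^{(i_1, i_2 + d, i_3 + d)}$, which is the $(i_1, i_2 + d, i_3 + d)$ coordinate projection
of $X$, is given as

\[ S(X^{(i_1, i_2 + d, i_3 + d)}) = \bigotimes_{n=1}^{L} \Bigl( \exp\bigl(x_n^{(i_1)} e_{i_1}\bigr) \otimes \exp\bigl(x_n^{(i_2)} e_{i_2 + d} + x_n^{(i_3)} e_{i_3 + d}\bigr) \Bigr), \]

then it follows that

\[ \pi^{(i_1, i_2 + d, i_3 + d)}(S(X)) = \sum_{n_1 < n_2 < n_3} x_{n_1}^{(i_1)} x_{n_2}^{(i_2)} x_{n_3}^{(i_3)} + \sum_{n_1 < n_2} x_{n_1}^{(i_1)} x_{n_1}^{(i_2)} x_{n_2}^{(i_3)} + \frac{1}{2} \sum_{n_1 < n_2} x_{n_1}^{(i_1)} x_{n_2}^{(i_2)} x_{n_2}^{(i_3)} + \frac{1}{2} \sum_{n_1 = 1}^{L} x_{n_1}^{(i_1)} x_{n_1}^{(i_2)} x_{n_1}^{(i_3)}. \]

Similarly, the signature of the path $X^{(i_1, i_2, i_3 + d)}$, which is the $(i_1, i_2, i_3 + d)$
coordinate projection of $X$, is given as

\[ S(X^{(i_1, i_2, i_3 + d)}) = \bigotimes_{n=1}^{L} \Bigl( \exp\bigl(x_n^{(i_1)} e_{i_1} + x_n^{(i_2)} e_{i_2}\bigr) \otimes \exp\bigl(x_n^{(i_3)} e_{i_3 + d}\bigr) \Bigr), \]

and thus it holds that

\[ \pi^{(i_1, i_2, i_3 + d)}(S(X)) = \sum_{n_1 < n_2 < n_3} x_{n_1}^{(i_1)} x_{n_2}^{(i_2)} x_{n_3}^{(i_3)} + \frac{1}{2} \sum_{n_1 < n_2} x_{n_1}^{(i_1)} x_{n_1}^{(i_2)} x_{n_2}^{(i_3)} + \sum_{n_1 < n_2} x_{n_1}^{(i_1)} x_{n_2}^{(i_2)} x_{n_2}^{(i_3)} + \frac{1}{2} \sum_{n_1 = 1}^{L} x_{n_1}^{(i_1)} x_{n_1}^{(i_2)} x_{n_1}^{(i_3)}. \]

Moreover we have that

\[ \pi^{(i_1 + d, i_2, i_3 + d)}(S(X)) = \sum_{n_1 < n_2 < n_3} x_{n_1}^{(i_1)} x_{n_2}^{(i_2)} x_{n_3}^{(i_3)} + \sum_{n_1 < n_2} x_{n_1}^{(i_1)} x_{n_2}^{(i_2)} x_{n_2}^{(i_3)}. \]

Combining the above equations, it follows that

\[ \sum_{n=1}^{L} x_n^{(i_1)} x_n^{(i_2)} x_n^{(i_3)} = 6\bigl(\pi^{(i_1, i_2, i_3)} + \pi^{(i_1, i_2, i_3 + d)} - \pi^{(i_1 + d, i_2, i_3 + d)} - \pi^{(i_1, i_2 + d, i_3 + d)}\bigr)(S(X)). \]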


6 Numerical Examples 

6.1 Toy Example 1: Correlation estimation 

In this toy example, we want to demonstrate that the signature of a stream can be 
used as a basis function to represent standard statistics, for example, the mean and 
the covariance matrix of the increment process. 

Example 6.1 We simulate 400 samples of the pair $\{(\rho_n, \mathbb{X}_{\rho_n})\}_{n=1}^{400}$, where $\rho_n$ is
iid and uniformly distributed in [0,1], and for each $\rho_n$, $\mathbb{X}_{\rho_n}$ is generated as a
2-dimensional random walk of length L with the correlation $\rho_n$, i.e.

"))■ 

How can we estimate the model parameter $\rho$ for each sample path?

Our method is simply to do the linear regression of the correlation parameter against 
the truncated signature of the sample path. To better judge the performance of our 
method, we used the empirical correlation as a benchmark. The empirical correlation 
for each sample path Xp is defined as follows: 



The parameters we chose are as follows:

L = 120, N = 200, d = 3

The figures show that the empirical correlation is better in terms of MSE, especially
when $\rho$ is near +1 and -1. Due to the nature of polynomial regression, the
signature approach performs worse when $\rho$ is near the boundary. However, the reason
why the signature approach is not satisfactory is not that the truncated
signature fails to include enough information about the path. Instead, the reason is
that the regression method we used is too simple, and it should be combined with
advanced non-linear regression techniques, e.g. rational regression or some local
regression methods. Theoretically, if properly combined with advanced regression
techniques, we should be able to recover the empirical correlation. This is because
Lemma 5.1 shows that the empirical covariance/variance of the increment process is a
linear combination of the truncated signature up to degree 2, and the ratio of the
empirical covariance to the square root of the product of the empirical variances of the
two coordinate increments gives the empirical correlation.

Figure 1: The plot of the empirical correlation vs. the actual correlation.

Figure 2: The plot of the two estimated correlations against the actual correlation.
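The last remark can be made explicit for a 2-dimensional stream, whose lead-lag path has coordinates 1, 2 (lead) and 3, 4 (lag). The formulas below are a sketch for illustration and assume the usual centred estimators with normalisation 1/L, which may differ from the definition of the empirical correlation used in the experiment.

```latex
\begin{align*}
m_i &= \tfrac{1}{L}\, \pi^{(i)}(S(X)), \qquad i \in \{1, 2\}, \\
\sum_{n=1}^{L} x_n^{(i)} x_n^{(j)} &= 2\bigl(\pi^{(i,\, j+2)} - \pi^{(i,\, j)}\bigr)(S(X)), \qquad i, j \in \{1, 2\}, \\
\widehat{\mathrm{cov}}_{ij} &= \tfrac{1}{L} \sum_{n=1}^{L} x_n^{(i)} x_n^{(j)} - m_i m_j, \\
\hat{\rho} &= \frac{\widehat{\mathrm{cov}}_{12}}{\sqrt{\widehat{\mathrm{cov}}_{11}\, \widehat{\mathrm{cov}}_{22}}}.
\end{align*}
```

Since $m_i m_j = \pi^{(i)} \pi^{(j)} / L^2$ is itself a linear form in the level-2 coordinates by the shuffle product property, each $\widehat{\mathrm{cov}}_{ij}$ is linear in the truncated signature of degree 2, and only the final ratio and square root are non-linear.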


6.2 Toy Example 2: Using signatures to classify two classes 
of random walks 

Example 6.2 Let X denote a standard 3-dimensional random walk of length L, 
and Y denote the other random walk, where are independent and move 

to +1 and -1 with probability 0.5, but . Given one realization of a
random walk of length L generated either by the distribution of X or that of Y, which
distribution is this realized path from?

In this example, we cannot tell which distribution a sample path is generated
from by looking at the empirical mean and covariance matrix of its increments,
simply because

\[ \mathbb{E}[x] = \mathbb{E}[y] = 0; \qquad \mathrm{cov}[x] = \mathrm{cov}[y] = I_3. \]


But we can almost perfectly classify this sample path using the truncated signatures 
in this case. We summarize the procedure as follows: 

1. We simulate N paths based on the distribution of X and Y respectively. 

2. Compute the truncated signature of those sample paths up to degree d. 

3. For each sample path X, let the response variable be defined in the following way:


f(X) =


if X is sampled from X; 
if X is sampled from Y. 
4. We randomly select half of the dataset as the learning set, and the rest of the data as
the backtesting set. Apply the SVM classification method to f(X) against $S_d(X)$
in the learning set, where d = 3.

5. After obtaining the classifier f, for any new given path X*, by plugging it into
the classifier f, the estimated class of X* is given by f(X*).

In this example, we choose N = 200, L = 100 and d = 3. The misclassification
ratio is 1/400, which means that there is only one misclassification in the whole
dataset of size 400. Note that the sample space of Y is a subset
of the sample space of X, and theoretically, if a sample path of X lies in the sample
space of Y, its category cannot be determined from the trajectory alone.